Overview

Introduction

In this study, we will be diving into the different statistics, salaries, and other factors of specific National Football League (NFL) players and teams that end up leading to overall success.

By gathering data across several years, we can evaluate which stats impact team success the most and which stats are “overrated”. The goal is to point out where the league is overlooking certain elements that could lead to winning and show what factors have been correctly evaluated.

Research Questions

How does a Quarterback’s salary correlate to the team’s success?

Which positions are the most valuable on a football team?

Is “home field advantage” a competitive advantage in the NFL?

Background

Overview of Football, the Positions, and the NFL

Football (American football, specifically) is a popular team sport in which two teams of eleven players each compete to advance the ball by passing, running, or kicking it. The main goal of the game is to outscore the opposing team. Without getting too complicated, the standard ways to score in a football game are:

  1. Touchdown (TD): Achieved by carrying the ball into the opponent’s endzone. Worth 6 points.
  2. Extra Point (PAT): Only able to be scored immediately following a touchdown by kicking the ball through the uprights. Worth 1 point.
  3. Two-Point Conversion: Other option to score immediately after a touchdown by carrying the ball into the other team’s endzone with only one chance. Worth 2 points.
  4. Field Goal (FG): Scored by kicking the ball through the uprights. Worth 3 points.
  • Positions in Football:
    • Offense:
      • Quarterback (QB): Leads the offense, responsible for passing and, occasionally, running the ball.
      • Running Back (RB): Runs with the ball, also catches passes and blocks.
      • Wide Receiver (WR): Catches passes from the quarterback.
      • Tight End (TE): Blocks and catches passes.
      • Offensive Linemen (OT, OG, C): Protect the quarterback and create running lanes for the running back.
    • Defense:
      • Defensive Linemen (DE, DT): Rush the quarterback, stop the run.
      • Linebackers (OLB, ILB): Tackle runners, cover receivers, blitz the quarterback.
      • Cornerbacks (CB): Cover wide receivers and intercept passes.
      • Safeties (FS, SS): Defend against deep passes and support against the run.
    • Special Teams:
      • Kicker (K): Kicks field goals and extra points.
      • Punter (P): Kicks the ball to the opposing team after a failed offensive drive.
      • Long Snapper: Snaps the ball to the punter or holder for kicks.
      • Return Specialists: Return punts and kickoffs for yards.

The NFL (National Football League): The NFL is the premier professional football league in the United States, consisting of 32 teams divided into two conferences: the National Football Conference (NFC) and the American Football Conference (AFC). Each conference is further divided into four divisions. The NFL season typically runs from September to February and includes an 18-week regular season, in which each team plays 17 games, followed by the playoffs and the Super Bowl, which determines the league champion.

The NFL attracts millions of fans worldwide and is known for its high level of competition, star players, and iconic teams. The league generates significant revenue through television contracts, merchandise sales, and ticket sales, making it one of the most lucrative sports leagues globally. The NFL also plays a significant role in American culture, with football Sundays becoming a tradition for many fans to gather and watch games with friends and family.

Data

Explanation

In this section, I explain some of the data I’ve chosen to use. This matters because it shows how the different variables connect and how they will help me answer my research questions.

Team Stats

Team Name: Name of the NFL team in question.

Home Team / Away Team: For a specific game, this variable tells us which team was at home or away.

Score Home / Score Away: For a specific game, the number of points scored by either team.

Winning Percentage: For each team, knowing their winning percentage can help to find correlations between success and other variables I will use later on.

Player Stats

Player Name: The name of the player in question.

Position: The position of a particular player. This is important because in football, players usually have a specific job on the field. Position can tell us whether a player is on offense or defense, how often they are likely to have the ball in a game, and how much they are on the field.

Team: The team that a player is playing for when they acquired the stats.

Passing Stats: Used to determine the stats of a QB throwing the ball.

Rushing Stats: Stats that show how well a RB or QB can run the ball during a play.

Receiving Stats: Used to determine the performance of pass catchers (RBs, WRs, and TEs).

Defensive Stats: Stats for either individual players, or an entire defense of a team. Since individual defensive players don’t get graded on fantasy points, defensive stats are going to be the least utilized in this presentation.

Fantasy Points: This is a stat that is generated for each player during every game they play. It takes into account positive stats, such as yards gained, touchdowns, and receptions, as well as negative stats, such as yards lost, fumbles, and interceptions. This will be a crucial stat since it is a well known way to measure a player’s impact on a game or their team.

Value & Production

Salary Boxplot

Analysis

These results have a few surprises. I was not expecting Quarterbacks (QB) to have a lower median salary than Left Tackles (LT). However, the likely explanation is that there are, on average, 2.65 QBs on an NFL roster but only 1.2 LTs. This means that there is usually a starting QB, who likely earns the most on almost every team, and one or two backup QBs who make significantly less than the starter per year. Meanwhile, a left tackle has a very highly valued job: protecting the QB from being hit on his blind side. Combined with the fact that there is usually only one on every team, this means that the median LT makes more than the median QB.

Fantasy Points Explanation

Fantasy points are a measure of the total yards, receptions, touchdowns, fumbles, and interceptions for individual offensive players. They weigh each of these in a way that is meant to measure a player’s performance in any game. This is useful because it gives us a way to look at how impactful a player is and compare this with his salary. In the following scatterplots, we look at all players’ salary vs. performance as well as “impact” players. “Impact” players are the ones that have a salary over $5 million in a season and average more than 10 fantasy points per game.

All Players

Impact Players

Analysis

These plots show a positive correlation between player salary and fantasy points per game for all offensive skill positions. However, as salaries get higher and higher, a player’s stats match up with the salary less and less. This is especially true for the Quarterback position, which I go into in more depth in the next tab.

Position Analysis

“Skill” Positions Meaning

In football, there are certain positions that are known as “skill positions.” Essentially, these are the positions on a team that are designated for handling the ball after the snap. On offense, there are 4 skill position types. They are Quarterback, Running Back, Wide Receiver, and Tight End. Here, we look at the production of each of these positions. To get the most accurate and relevant data, I have decided to only look at the players making over the average salary for their respective positions.

Quarterbacks

Running Backs

Wide Receivers

Tight Ends

Analysis

These plots show how each position performs relative to salary. Quarterbacks are shown to be able to perform well and score a lot of fantasy points even at a lower salary. This is especially true for the QBs who scored the most fantasy points. Tight Ends and Wide Receivers both show a positive correlation between salary and fantasy points per game, which tells us that teams that spend a lot on those positions get a lot in return, while teams that spend the least on those positions won’t have as strong a passing game (TEs and WRs are the ones who receive the most passes). Running backs are a bit of a different story. Their position is often undervalued by NFL franchises because running backs are known for having injuries that limit future performance. Because of this, RBs typically make the least of the skill positions despite making up half of the touches in the game. To some degree, getting good value out of a running back comes down to the luck of whether or not the player can remain healthy.

Most Valuable Players

In the NFL, the Most Valuable Player (MVP) Award is given to the player whom national writers and journalists vote as having the highest impact on team success in a season. It is considered the most prestigious award and highest honor a player can achieve in his career. In the following section, we will look into the stats of MVPs from 2014 to 2020 and see what kinds of stats lead to a player being voted MVP.

Fantasy Points

Yards per Game

Total Touchdowns

Analysis

In these plots, we see that, as expected, the players who won MVP from 2014 to 2020 far exceeded the league average for each of the stats. One thing to also note is that almost all of the MVPs were making less than the league average starting QB salary, indicating once again that many teams overspend on QBs and don’t get the expected value in return. One argument for why this occurs is that because many of the QBs above are making less than the league average, their teams have more money to spend on other positions, which makes the team as a whole better. Even though MVP is supposed to be independent of a player’s team, in practice it is clear that the award is often given to the best player on the best team in the league.

Wins and QB Salary

Winning Percentage and QB Salary

Analysis

Here, we look at how a team’s winning percentage varies with the amount it pays its starting Quarterback. The overall positive trend of the plot means that paying a QB more usually goes along with a better winning percentage. This somewhat contradicts what we saw earlier with MVP performance and salary, which showed that the league average salary for a starting QB was higher than the salaries of MVPs. My argument there was that allocating more money to other positions would lead to more team success, but this plot seems to state the opposite. What it suggests is what many people, including myself before starting this assignment, believe about the Quarterback position: that it is the most important to winning and is therefore the most valuable.

Home Field Advantage

Description

Here, we look at how home field affects winning percentage in the NFL. We look at this both through teams’ straight-up wins and through how these percentages change when betting spreads are factored in. This will also tell us which teams have the best home field advantage. Since each team plays about the same number of home and away games each season, home field advantage is a very important statistic.

Home Field Advantage (Reg. Season)

Analysis

What this data shows is that, as expected, every team has had a better record at home than away during the regular season since the year 2000. The reason why some teams have a bigger home advantage than others could be a variety of factors. Teams such as the Vikings, Seahawks, and Ravens are known to have some of the best atmospheres and most dedicated fans in the league. In the playoffs, some of the strange-looking stats are probably due to a smaller sample size. For example, the Detroit Lions have a 100% win percentage at home in the playoffs since 2000, but they have only played one home playoff game. Also, one outlier is the Cleveland Browns, the only team that hasn’t played a home playoff game since 2000.

Home Field Advantage (Playoffs)

Conclusion

Conclusion

After looking over the entire presentation, a few things stand out.

  • Player success doesn’t necessarily depend directly on salary, but salary is often a good indicator of a player’s ability. Since teams often compete with each other to sign players, it makes sense that some players have a high salary but a worse performance in a given year. On the flip side, rookies can only make a capped amount of money for usually 3 to 4 years depending on when they were drafted. This means that by selecting a player later in the draft who has great output in his first few years, a team can get a lot of value from a player while spending a small amount of money.

  • Team success can be shown to be directly impacted by a QB’s salary. As I previously mentioned, QBs play the most valuable position on a football field, so it makes sense that they have the highest impact on team success and that their salaries reflect that.

  • Teams usually get what they pay for with WRs and TEs. These two positions have been used more and more in recent years with the West Coast style offense beginning to emerge in the NFL. As time goes on, I predict that these positions’ salaries will become highly inflated.

  • Regular Season home field advantage doesn’t translate to the Playoffs. A common adage in professional sports is that the regular season and the postseason are two completely different games. With the bar charts we have looked at, that seems to be the case. I would have expected teams with a much better home record in the regular season to carry that over to the playoffs, but that does not seem to happen. Whether it is simply due to a smaller sample size or a dampened effect of home field advantage in the playoffs, it is unclear why teams that perform well at home during the regular season don’t always do as well in the postseason.

Difficulties

During this project, I ran into several challenges. Some of these were likely avoidable and some of them weren’t. For one, the many team name changes and relocations made it a struggle to join some of these data frames. I was unsure why I couldn’t use the “name” variable as a key for a couple of different data frames, and it took me a long time to realize why. This could have been solved much earlier by printing all the different team names in each set of data and then adjusting them until I was left with the correct number. Another issue is that data for some of these stats can be very hard to find. I was able to find every score for every NFL game in history within minutes, but I searched for days to find the salaries of all NFL players, only to find a set that ranged from 2014 to 2020. This left a lot of gaps because I had far too much data in some places but not in others, which meant I couldn’t go into everything I originally wanted to.
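
The team-name check described above could look roughly like the following. This is only a sketch: the two vectors are made-up stand-ins for the team columns of two data sets, and none of the variable names come from the dashboard's code.

```r
# Hypothetical team-name columns from two data sets that should share a key.
scores_teams   <- c("Las Vegas Raiders", "Oakland Raiders", "Cleveland Browns")
salaries_teams <- c("Las Vegas Raiders", "Cleveland Browns")

# Names that appear in one data set but not in the other.
only_in_scores   <- setdiff(unique(scores_teams), unique(salaries_teams))
only_in_salaries <- setdiff(unique(salaries_teams), unique(scores_teams))

# With real data, more than 32 unique names signals old or misspelled names.
length(unique(scores_teams))
only_in_scores
only_in_salaries
```

Any name that shows up in only one of the two sets is a candidate for renaming before the join.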

About the Author

My name is Riley Barrett and at the time of this presentation I will be graduating in less than a week. I chose to do this presentation on this topic because I was interested in seeing some new things about the NFL that I hadn’t really thought about before. I have always been a big fan of the Cleveland Browns, which led me to this project initially since they have been a franchise with little success in the past 30 years.

I have worked at the Wright-Patterson AFB for almost 2 years where I have done data manipulation on all different kinds of data.

I took this class because I wanted to learn more about different ways to display and manipulate large data sets.

In the next month, I will begin a career in electrical engineering at a firm in Cleveland, OH, where I can hopefully use what I learned in this class to benefit myself and my company.

---
title: "An In-Depth Look into the NFL"
output: 
  flexdashboard::flex_dashboard:
    theme:
      version: 4
      bootswatch: default
      navbar-bg: "blue"
    orientation: columns
    vertical_layout: fill
    source_code: embed
---


<style>
.chart-title {  /* chart_title  */
   font-size: 20px;
  }
body{  /* Normal  */
      font-size: 18px;
  }
</style>

```{r setup, include=FALSE}
# In this cell block, I am trying to get most of the datatables ready for the following graphs and tables.

library(flexdashboard)
library(tidyverse)
library(DT)
library(plotly)
library(scales)
library(maps)

# Import Yearly Stats Dataset (2008 - 2023)
yearly_stats <- read_csv("yearly_data_updated_08_23.csv")
names(yearly_stats)

# Define the key for team abbreviations
team_key <- c(
  "ARI" = "Arizona Cardinals",
  "ATL" = "Atlanta Falcons",
  "BAL" = "Baltimore Ravens",
  "BUF" = "Buffalo Bills",
  "CAR" = "Carolina Panthers",
  "CHI" = "Chicago Bears",
  "CIN" = "Cincinnati Bengals",
  "CLE" = "Cleveland Browns",
  "DAL" = "Dallas Cowboys",
  "DEN" = "Denver Broncos",
  "DET" = "Detroit Lions",
  "GB" = "Green Bay Packers",
  "HOU" = "Houston Texans",
  "IND" = "Indianapolis Colts",
  "JAX" = "Jacksonville Jaguars",
  "KC" = "Kansas City Chiefs",
  "LAC" = "Los Angeles Chargers",
  "LA" = "Los Angeles Rams",
  "LV" = "Las Vegas Raiders",
  "MIA" = "Miami Dolphins",
  "MIN" = "Minnesota Vikings",
  "NE" = "New England Patriots",
  "NO" = "New Orleans Saints",
  "NYG" = "New York Giants",
  "NYJ" = "New York Jets",
  "PHI" = "Philadelphia Eagles",
  "PIT" = "Pittsburgh Steelers",
  "SF" = "San Francisco 49ers",
  "SEA" = "Seattle Seahawks",
  "TB" = "Tampa Bay Buccaneers",
  "TEN" = "Tennessee Titans",
  "WAS" = "Washington Commanders"
)

# Use key to convert abbreviations to full name.
yearly_stats$team <- team_key[yearly_stats$team]

# Import Salaries For Each Player (2014 - 2020)
salaries_by_player <- read_csv("2014-thru-2020-cap-tables.csv")
names(salaries_by_player)

# Import Weekly Scores
weekly_scores <- read_csv("spreadspoke_scores.csv")
names(weekly_scores)

# Keep only weekly scores of 2000 season or later.
weekly_scores <- weekly_scores %>%
  filter(schedule_season >= 2000)

# Normalize historical team names to each franchise's current name
# (Washington rebrands; Raiders, Rams, and Chargers relocations).
team_renames <- c(
  "Washington Redskins"      = "Washington Commanders",
  "Washington Football Team" = "Washington Commanders",
  "Oakland Raiders"          = "Las Vegas Raiders",
  "St. Louis Rams"           = "Los Angeles Rams",
  "San Diego Chargers"       = "Los Angeles Chargers"
)

# Look up each name in the rename table; names without an entry are kept as-is.
weekly_scores <- weekly_scores %>%
  mutate(across(c(team_home, team_away),
                ~ coalesce(unname(team_renames[.x]), .x)))

# Combine the team names from both 'team_home' and 'team_away' into a single column
all_teams <- weekly_scores %>%
  select(team_home, team_away) %>%
  pivot_longer(cols = c(team_home, team_away), names_to = "home_away", values_to = "team_name")

# Now, find the unique team names
unique_teams <- all_teams %>%
  distinct(team_name) %>%
  arrange(team_name)  # This will sort the team names alphabetically

# Count the number of unique team names
unique_teams_count <- nrow(unique_teams)

# Print the number of unique team names
print(paste("Number of unique team names:", unique_teams_count))

# Print the list of unique team names
print("List of unique team names:")
print(unique_teams$team_name)

team_colors <- data.frame(
  team_home = c("New England Patriots", "Kansas City Chiefs", "Green Bay Packers", "New Orleans Saints", 
                "Tampa Bay Buccaneers", "Buffalo Bills", "Baltimore Ravens", "Los Angeles Rams", 
                "Seattle Seahawks", "Pittsburgh Steelers", "Minnesota Vikings", "Indianapolis Colts", 
                "Philadelphia Eagles", "Tennessee Titans", "Dallas Cowboys", "San Francisco 49ers", 
                "Houston Texans", "Arizona Cardinals", "Chicago Bears", "Las Vegas Raiders", 
                "Cleveland Browns", "Atlanta Falcons", "Carolina Panthers", "Jacksonville Jaguars", 
                "New York Giants", "Los Angeles Chargers", "Cincinnati Bengals", "Denver Broncos", 
                "Miami Dolphins", "Washington Commanders", "New York Jets", "Detroit Lions"),
  primary_color = c("#002244", "#E31837", "#203731", "#D3BC8D", "#D50A0A", "#00338D", "#241773", 
                    "#003594", "#69BE28", "#FFB612", "#4F2683", "#003D79", "#004C54", "#2C135E", 
                    "#041E42", "#AA0000", "#03202F", "#97233F", "#0B162A", "#000000", 
                    "#FF3C00", "#A71930", "#0085CA", "#006778", "#FFA300", "#002244", "#FB4F14", 
                    "#FB4F14", "#008E97", "#773141", "#125740", "#006778")
)

team_color <- data.frame(
  team = c("New England Patriots", "Kansas City Chiefs", "Green Bay Packers", "New Orleans Saints", 
                "Tampa Bay Buccaneers", "Buffalo Bills", "Baltimore Ravens", "Los Angeles Rams", 
                "Seattle Seahawks", "Pittsburgh Steelers", "Minnesota Vikings", "Indianapolis Colts", 
                "Philadelphia Eagles", "Tennessee Titans", "Dallas Cowboys", "San Francisco 49ers", 
                "Houston Texans", "Arizona Cardinals", "Chicago Bears", "Las Vegas Raiders", 
                "Cleveland Browns", "Atlanta Falcons", "Carolina Panthers", "Jacksonville Jaguars", 
                "New York Giants", "Los Angeles Chargers", "Cincinnati Bengals", "Denver Broncos", 
                "Miami Dolphins", "Washington Commanders", "New York Jets", "Detroit Lions"),
  primary_color = c("#002244", "#E31837", "#203731", "#D3BC8D", "#D50A0A", "#00338D", "#241773", 
                    "#003594", "#69BE28", "#FFB612", "#4F2683", "#003D79", "#004C54", "#2C135E", 
                    "#041E42", "#AA0000", "#03202F", "#97233F", "#0B162A", "#000000", 
                    "#FF3C00", "#A71930", "#0085CA", "#006778", "#FFA300", "#002244", "#FB4F14", 
                    "#FB4F14", "#008E97", "#773141", "#125740", "#006778")
)


# Calculate percent of home wins
percent_home_wins <- mean(weekly_scores$score_home > weekly_scores$score_away) * 100

# Calculate average winning margin for home teams
average_winning_margin_home <- mean(weekly_scores$score_home - weekly_scores$score_away)

# Subset data for playoff games
playoff_games <- subset(weekly_scores, schedule_playoff == "TRUE")

# Calculate percent of playoff home wins
percent_playoff_home_wins <- mean(playoff_games$score_home > playoff_games$score_away) * 100

# Calculate average winning margin for home teams in playoff games
average_winning_margin_home_playoffs <- mean(playoff_games$score_home - playoff_games$score_away)


```

Overview
===

Column {data-width=250}
-----------------------------------------------------------------------

### Introduction

In this study, we will be diving into the different statistics, salaries, and other factors of specific National Football League (NFL) players and teams that end up leading to overall success.

By gathering data across several years, we can evaluate which stats impact team success the most and which stats are "overrated". The goal is to point out where the league is overlooking certain elements that could lead to winning and show what factors have been correctly evaluated. 

### Research Questions

How does a Quarterback's salary correlate to the team's success?

Which positions are the most valuable on a football team?

Is "home field advantage" a competitive advantage in the NFL?

Column {data-width=750}
-----------------------------------------------------------------------

### Background

**Overview of Football, the Positions, and the NFL**

Football (American football, specifically) is a popular team sport in which two teams of eleven players each compete to advance the ball by passing, running, or kicking it. The main goal of the game is to outscore the opposing team. Without getting too complicated, the standard ways to score in a football game are:

1. Touchdown (TD): Achieved by carrying the ball into the opponent's endzone. Worth 6 points.
2. Extra Point (PAT): Only able to be scored immediately following a touchdown by kicking the ball through the uprights. Worth 1 point.
3. Two-Point Conversion: Other option to score immediately after a touchdown by carrying the ball into the other team's endzone with only one chance. Worth 2 points.
4. Field Goal (FG): Scored by kicking the ball through the uprights. Worth 3 points.

- Positions in Football:
  - Offense:
    - Quarterback (QB): Leads the offense, responsible for passing and, occasionally, running the ball. 
    - Running Back (RB): Runs with the ball, also catches passes and blocks.  
    - Wide Receiver (WR): Catches passes from the quarterback.
    - Tight End (TE): Blocks and catches passes.
    - Offensive Linemen (OT, OG, C): Protect the quarterback and create running lanes for the running back.
  - Defense: 
    - Defensive Linemen (DE, DT): Rush the quarterback, stop the run. 
    - Linebackers (OLB, ILB): Tackle runners, cover receivers, blitz the quarterback. 
    - Cornerbacks (CB): Cover wide receivers and intercept passes. 
    - Safeties (FS, SS): Defend against deep passes and support against the run.
  - Special Teams: 
    - Kicker (K): Kicks field goals and extra points. 
    - Punter (P): Kicks the ball to the opposing team after a failed offensive drive. 
    - Long Snapper: Snaps the ball to the punter or holder for kicks. 
    - Return Specialists: Return punts and kickoffs for yards.

**The NFL (National Football League):** 
The NFL is the premier professional football league in the United States, consisting of 32 teams divided into two conferences: the National Football Conference (NFC) and the American Football Conference (AFC). Each conference is further divided into four divisions. The NFL season typically runs from September to February and includes an 18-week regular season, in which each team plays 17 games, followed by the playoffs and the Super Bowl, which determines the league champion.

The NFL attracts millions of fans worldwide and is known for its high level of competition, star players, and iconic teams. The league generates significant revenue through television contracts, merchandise sales, and ticket sales, making it one of the most lucrative sports leagues globally. The NFL also plays a significant role in American culture, with football Sundays becoming a tradition for many fans to gather and watch games with friends and family.

Data
===

Column {data-width=300}
---

### Explanation
In this section, I explain some of the data I've chosen to use. This matters because it shows how the different variables connect and how they will help me answer my research questions.

Column {data-width=350}
---

### Team Stats

**Team Name:** Name of the NFL team in question.

**Home Team / Away Team:** For a specific game, this variable tells us which team was at home or away.

**Score Home / Score Away:** For a specific game, the number of points scored by either team.

**Winning Percentage:** For each team, knowing their winning percentage can help to find correlations between success and other variables I will use later on.
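
As a rough illustration of how Winning Percentage can be derived from the game-level variables above, here is a minimal sketch. The `games` table and its values are made up for the example; only the column names `team_home`, `team_away`, `score_home`, and `score_away` mirror the weekly scores data. Ties are counted as half a win, which follows the NFL convention but is an assumption about how this dashboard treats them.

```r
library(dplyr)

# Hypothetical game results; not taken from the real data set.
games <- tibble(
  team_home  = c("Cleveland Browns", "Detroit Lions", "Cleveland Browns"),
  team_away  = c("Detroit Lions", "Cleveland Browns", "Detroit Lions"),
  score_home = c(24, 17, 27),
  score_away = c(17, 17, 20)
)

# One row per team per game, with that team's points for and against.
team_games <- bind_rows(
  games %>% transmute(team = team_home, pts_for = score_home, pts_against = score_away),
  games %>% transmute(team = team_away, pts_for = score_away, pts_against = score_home)
)

# Wins count 1, ties count 0.5, losses count 0.
win_pct <- team_games %>%
  mutate(result = case_when(
    pts_for > pts_against ~ 1,
    pts_for == pts_against ~ 0.5,
    TRUE ~ 0
  )) %>%
  group_by(team) %>%
  summarise(games = n(), win_pct = mean(result), .groups = "drop")

print(win_pct)
```

Stacking the home and away views first means every game counts once for each team, so home and away results end up in the same percentage.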


Column {data-width=350}
---

### Player Stats

**Player Name:** The name of the player in question.

**Position:** The position of a particular player. This is important because in football, players usually have a specific job on the field. Position can tell us whether a player is on offense or defense, how often they are likely to have the ball in a game, and how much they are on the field.

**Team:** The team that a player is playing for when they acquired the stats.

**Passing Stats:** Used to determine the stats of a QB throwing the ball.

**Rushing Stats:** Stats that show how well a RB or QB can run the ball during a play.

**Receiving Stats:** Used to determine the performance of pass catchers (RBs, WRs, and TEs).

**Defensive Stats:** Stats for either individual players, or an entire defense of a team. Since individual defensive players don't get graded on fantasy points, defensive stats are going to be the least utilized in this presentation.

**Fantasy Points:** This is a stat that is generated for each player during every game they play. It takes into account positive stats, such as yards gained, touchdowns, and receptions, as well as negative stats, such as yards lost, fumbles, and interceptions. This will be a crucial stat since it is a well known way to measure a player's impact on a game or their team.
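
To make the idea of weighing positive and negative stats concrete, here is a small sketch of a points-per-reception (PPR) calculation. The function name `ppr_points` and the weights are assumptions for illustration: they follow a commonly used PPR scheme, and the scoring behind the `fantasy_points_ppr` column in the data set may differ.

```r
# Hypothetical helper: PPR fantasy points from a player's stat line.
# The weights are a common PPR convention, assumed here for illustration.
ppr_points <- function(pass_yds = 0, pass_td = 0, interceptions = 0,
                       rush_yds = 0, rush_td = 0,
                       receptions = 0, rec_yds = 0, rec_td = 0,
                       fumbles_lost = 0) {
  0.04 * pass_yds + 4 * pass_td - 2 * interceptions +
    0.1 * rush_yds + 6 * rush_td +
    1 * receptions + 0.1 * rec_yds + 6 * rec_td -
    2 * fumbles_lost
}

# Made-up stat lines for a receiver and a quarterback.
wr_game <- ppr_points(receptions = 6, rec_yds = 80, rec_td = 1)
qb_game <- ppr_points(pass_yds = 250, pass_td = 2, interceptions = 1, rush_yds = 20)

print(c(wr = wr_game, qb = qb_game))
```

Under these assumed weights, turnovers subtract points while yards, catches, and touchdowns add them, which is why one number can summarize a whole stat line.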



Value & Production
===

Column {data-width=500}
---

### Salary Boxplot

```{r boxplot of salary}
# Creating the boxplot for salaries by position
ggplot(salaries_by_player, aes(x = pos, y = cap_hit)) +
  geom_boxplot() +
  scale_y_continuous(labels = label_comma()) +
  labs(title = "Distribution of Salaries by Position",
       x = "Position",
       y = "Salary ($)") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

```

### Analysis

These results have a few surprises. I was not expecting Quarterbacks (QB) to have a lower median salary than Left Tackles (LT). However, the likely explanation is that there are, on average, 2.65 QBs on an NFL roster but only 1.2 LTs. This means that there is usually a starting QB, who likely earns the most on almost every team, and one or two backup QBs who make significantly less than the starter per year. Meanwhile, a left tackle has a very highly valued job: protecting the QB from being hit on his blind side. Combined with the fact that there is usually only one on every team, this means that the median LT makes more than the median QB.
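
The effect of backups on the median can be seen with a toy example. The cap hits below are invented round numbers for a single hypothetical team, not values from the salary data.

```r
# Made-up cap hits (in $ millions) for one hypothetical team.
qb_salaries <- c(starter = 30, backup_1 = 2, backup_2 = 1)
lt_salaries <- c(starter = 15)

# The backup contracts pull the QB median well below the starter's salary,
# while the lone left tackle's salary is the LT median.
median(qb_salaries)
median(lt_salaries)

# The mean is less affected, because the starter's contract dominates it.
mean(qb_salaries)
```

In this toy roster the starting QB earns twice what the left tackle does, yet the median QB salary is far lower than the median LT salary.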

Column {.tabset data-width=500}
---

### Fantasy Points Explanation

Fantasy points are a measure of the total yards, receptions, touchdowns, fumbles, and interceptions for individual offensive players. They weigh each of these in a way that is meant to measure a player's performance in any game. This is useful because it gives us a way to look at how impactful a player is and compare this with his salary. In the following scatterplots, we look at all players' salary vs. performance as well as "impact" players. "Impact" players are the ones that have a salary over $5 million in a season and average more than 10 fantasy points per game.

### All Players

```{r}
# Join salaries to yearly stats so performance can be compared with salary
value_data <- inner_join(salaries_by_player, yearly_stats, by = c("name", "season")) %>%
  # Drop the salary table's team column (the stats table's team is renamed below)
  select(-team.x) %>%
  rename(team = team.y) %>%
  left_join(team_color, by = "team") %>%
  filter(fantasy_points_ppr > 0) %>%
  mutate(ppr_per_game = fantasy_points_ppr / games)

# Scatterplot of data for all positions
ggplot(value_data, aes(y = cap_hit, x = ppr_per_game, color = position)) +
  geom_point(alpha = 0.6) +
  geom_smooth(se = FALSE) +
  scale_color_manual(values = c("blue","red","green","purple")) +
  scale_y_continuous(labels = label_comma()) +
  labs(title = "Fantasy Points per Game vs. Salary",
       y = "Salary ($)",
       x = "Fantasy Points per Game") +
  theme_minimal() +
  theme(axis.text.x = element_text()) 
```

### Impact Players

```{r}
# Keep only impact players: cap hit over $5M and more than 10 PPR points per game
high_value_players <- value_data %>% 
  filter(cap_hit > 5000000, ppr_per_game > 10)

# Scatterplot of data for impact players
ggplot(high_value_players, aes(y = cap_hit, x = ppr_per_game, color = position)) +
  geom_point(alpha = 0.7) +
  geom_smooth(se = FALSE) +
  scale_color_manual(values = c("blue","red","green","purple")) +
  scale_y_continuous(labels = label_comma()) +
  labs(title = "Fantasy Points per Game vs. Salary (Impact Players)",
       y = "Salary ($)",
       x = "Fantasy Points per Game") +
  theme_minimal() +
  theme(axis.text.x = element_text())
```

### Analysis

These plots show a positive correlation between player salary and fantasy points per game for all offensive skill positions. However, as salaries get higher and higher, a player's stats track his salary less closely. This is especially true at the Quarterback position, which I go into in more depth in the next tab.
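One way to put a number on the trend in each plot is a per-position rank correlation. This is a minimal sketch on made-up rows that stand in for `value_data`. The toy values and the names `toy_value_data` and `position_cor` are hypothetical.

```r
library(dplyr)

# Toy rows standing in for value_data; every number here is invented
toy_value_data <- tibble(
  position     = rep(c("QB", "RB"), each = 4),
  ppr_per_game = c(12, 15, 18, 22, 6, 9, 13, 17),
  cap_hit      = c(2e6, 8e6, 20e6, 28e6, 1e6, 2e6, 4e6, 9e6)
)

# Spearman correlation uses ranks, so a few very large salaries do not dominate
position_cor <- toy_value_data %>%
  group_by(position) %>%
  summarise(rho = cor(ppr_per_game, cap_hit, method = "spearman"),
            .groups = "drop")

position_cor
```

Running the same summary on only the high-salary rows would show whether the relationship weakens at the top of the pay scale.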




Position Analysis
===

Column {.tabset}
---

**"Skill" Positions Meaning**

In football, there are certain positions that are known as "skill positions." Essentially, these are the positions on a team that are designated for handling the ball after the snap. On offense, there are 4 skill position types. They are Quarterback, Running Back, Wide Receiver, and Tight End. Here, we look at the production of each of these positions. To get the most accurate and relevant data, I have decided to only look at the players making over the average salary for their respective positions.
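The above-average filter described here can also be written once for all positions instead of once per position. The sketch below assumes a data frame with `position` and `cap_hit` columns. The toy players and salaries are invented.

```r
library(dplyr)

# Toy stand-in for value_data; names and salaries are hypothetical
toy_players <- tibble(
  name     = c("A", "B", "C", "D", "E", "F"),
  position = c("QB", "QB", "QB", "RB", "RB", "RB"),
  cap_hit  = c(25e6, 5e6, 1e6, 8e6, 2e6, 1e6)
)

# Keep players whose cap hit exceeds the mean for their own position
above_average <- toy_players %>%
  group_by(position) %>%
  filter(cap_hit > mean(cap_hit, na.rm = TRUE)) %>%
  ungroup()

above_average$name
```

Grouping before filtering means each player is compared with the mean of his own position only, which matches the per-position chunks in the tabs.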

### Quarterbacks

```{r}
# Filter to only include quarterbacks
quarterbacks <- value_data %>%
  filter(position == "QB")

average_salary_qb <- mean(quarterbacks$cap_hit, na.rm = TRUE)

# Keep only the highest salary qbs
high_salary_qbs <- quarterbacks %>%
  filter(cap_hit > average_salary_qb)

# Quarterback plot
ggplot(high_salary_qbs, aes(y = cap_hit, x = ppr_per_game)) +
  geom_point(aes(color = primary_color), size = 3, show.legend = FALSE) +
  geom_smooth(se = FALSE, color = "blue") +
  scale_color_identity() +
  scale_y_continuous(labels = label_comma()) +
  labs(title = "Fantasy Points per Game vs. Salary for High-Salary QBs",
       y = "Salary ($)",
       x = "Fantasy Points per Game") +
  theme_minimal()
```


### Running Backs

```{r}
# Filter to only include running backs
runningbacks <- value_data %>%
  filter(position == "RB")

average_salary_rb <- mean(runningbacks$cap_hit, na.rm = TRUE)

# Keep only the highest salary rbs
high_salary_rbs <- runningbacks %>%
  filter(cap_hit > average_salary_rb)

# Running Back plot
ggplot(high_salary_rbs, aes(y = cap_hit, x = ppr_per_game)) +
  geom_point(aes(color = primary_color), size = 3, show.legend = FALSE) +
  geom_smooth(se = FALSE, color = "red") +
  scale_color_identity() +
  scale_y_continuous(labels = label_comma()) +
  labs(title = "Fantasy Points per Game vs. Salary for High-Salary RBs",
       y = "Salary ($)",
       x = "Fantasy Points per Game") +
  theme_minimal()
```

### Wide Receivers

```{r}
# Filter to only include wide receivers
widereceivers <- value_data %>%
  filter(position == "WR")

average_salary_wr <- mean(widereceivers$cap_hit, na.rm = TRUE)

# Keep only the highest salary wrs
high_salary_wrs <- widereceivers %>%
  filter(cap_hit > average_salary_wr)

# Wide Receivers plot
ggplot(high_salary_wrs, aes(y = cap_hit, x = ppr_per_game)) +
  geom_point(aes(color = primary_color), size = 3, show.legend = FALSE) +
  geom_smooth(se = FALSE, color = "purple") +
  scale_color_identity() +
  scale_y_continuous(labels = label_comma()) +
  labs(title = "Fantasy Points per Game vs. Salary for High-Salary WRs",
       y = "Salary ($)",
       x = "Fantasy Points per Game") +
  theme_minimal()
```

### Tight Ends

```{r}
# Filter to only include tight ends
tightends <- value_data %>%
  filter(position == "TE")

average_salary_te <- mean(tightends$cap_hit, na.rm = TRUE)

# Keep only the highest salary tes
high_salary_tes <- tightends %>%
  filter(cap_hit > average_salary_te)

# Tight ends plot
ggplot(high_salary_tes, aes(x = ppr_per_game, y = cap_hit)) +
  geom_point(aes(color = primary_color), size = 3, show.legend = FALSE) +
  geom_smooth(se = FALSE, color = "green") +
  scale_color_identity() +
  scale_y_continuous(labels = label_comma()) +
  labs(title = "Fantasy Points per Game vs. Salary for High-Salary TEs",
       y = "Salary ($)",
       x = "Fantasy Points per Game") +
  theme_minimal()
```

### Analysis

These plots show how each position performs relative to salary. Quarterbacks are shown to be able to perform well and score a lot of fantasy points even at lower salaries. This is especially true for the QBs who scored the most fantasy points. Tight Ends and Wide Receivers both show a positive correlation between salary and fantasy points per game, which tells us that teams that spend a lot on those positions get a lot in return, while teams that spend the least on those positions won't have as strong of a passing game (TEs and WRs are the ones who receive the most passes). Running backs are a bit of a different story. Their position is often overlooked by franchises in the NFL because they are known for having injuries that prevent future performance. Because of this, RBs typically make the least of the skill positions despite making up half of the touches in the game. To some degree, getting good value out of a running back comes down to the luck of whether or not the player can remain healthy.


Most Valuable Players
===


In the NFL, the Most Valuable Player (MVP) Award is given to the player that national writers and journalists vote as having the highest impact on team success in a season. This is considered to be the most prestigious award and highest honor a player could achieve within his career. In the following section, we will look into the stats of MVPs from 2014 - 2020 and see what kinds of stats lead to a player being voted MVP.

Column {.tabset}
---

### Fantasy Points

```{r}
# Load the MVPs from 2014 - 2020
mvp_winners <- data.frame(
  name = c("Lamar Jackson", "Patrick Mahomes", "Aaron Rodgers", "Matt Ryan", "Cam Newton", "Tom Brady", "Aaron Rodgers"),
  season = c(2019, 2018, 2014, 2016, 2015, 2017, 2020)
)

# Make new dataframe for mvps
mvp_data <- value_data %>%
  inner_join(mvp_winners, by = c("name", "season")) %>%
  mutate(yds_per_game = total_yards / games) %>%
  select(name, position, total_tds, yds_per_game, ppr_per_game, cap_hit, primary_color)

# Find the averages across above-average-salary QBs, used as the "League Average" baseline
league_averages <- high_salary_qbs %>%
  summarise(
    average_touchdowns = mean(total_tds, na.rm = TRUE),
    average_yards_per_game = mean(total_yards / games, na.rm = TRUE),
    average_fantasy_points_per_game = mean(ppr_per_game, na.rm = TRUE),
    average_starting_qb_salary = mean(cap_hit, na.rm = TRUE)
  )

# Make "League Average" observation
league_avg_observation <- data.frame(
  name = "League Average",
  position = "QB",
  total_tds = league_averages$average_touchdowns,
  yds_per_game = league_averages$average_yards_per_game,
  ppr_per_game = league_averages$average_fantasy_points_per_game,
  cap_hit = league_averages$average_starting_qb_salary,
  primary_color = "#000000"
)

mvp_data <- rbind(mvp_data, league_avg_observation)

# Fantasy  
ggplot(mvp_data, aes(y = cap_hit, x = ppr_per_game, label = name)) +
  geom_point(aes(color = primary_color), size = 3) +
  geom_text(vjust = -0.75, hjust = 0.25, size = 2.5) +
  scale_color_identity() +
  scale_y_continuous(labels = label_comma()) +
  labs(title = "Salary vs. Fantasy Performance for NFL MVPs", 
          y = "Salary ($)", 
          x = "Fantasy Points per Game") + 
  theme_minimal()
       
```


### Yards per Game

```{r}
# Yards per game
ggplot(mvp_data, aes(y = cap_hit, x = yds_per_game, label = name)) +
  geom_point(aes(color = primary_color), size = 3) +
  geom_text(vjust = -0.75, hjust = 0.25, size = 2.5) +
  scale_color_identity() +
  scale_y_continuous(labels = label_comma()) +
  labs(title = "Salary vs. Yards per Game for NFL MVPs", 
          y = "Salary ($)", 
          x = "Total Yards Per Game") + 
  theme_minimal()
```


### Total Touchdowns

```{r}
ggplot(mvp_data, aes(y = cap_hit, x = total_tds, label = name)) +
  geom_point(aes(color = primary_color), size = 3) +
  geom_text(vjust = -0.75, hjust = 0.25, size = 2.5) +
  scale_color_identity() +
  scale_y_continuous(labels = label_comma()) +
  labs(title = "Salary vs. Total TDs for NFL MVPs", 
          y = "Salary ($)", 
          x = "Total Touchdowns") + 
  theme_minimal()
```

### Analysis

In these plots, we see that, as expected, the players who won MVP from 2014 to 2020 far exceeded the league average for each of the stats. One thing to also note is that almost all of the MVPs were making less than the league average starting QB salary, indicating once again that many teams overspend on QBs and don't get the expected value in return. One argument for why this is occurring is that because many of the QBs above are making less than the league average, their teams have more money to give out to other positions, which makes the team as a whole better. Even though MVP is supposed to be independent of a player's team, in practice it is clear that the Most Valuable Player award is often given to the best player on the best team in the league.

Wins and QB Salary
===

Column
---

### Winning Percentage and QB Salary

```{r}
# Filter for quarterbacks
qb_data <- value_data %>%
  filter(position == "QB")

# Find the highest-paid QB per team per year
highest_paid_qbs <- qb_data %>%
  group_by(team, season) %>%
  summarize(highest_salary = max(cap_hit), .groups = 'drop')

# Filter data for relevant years
relevant_scores <- weekly_scores %>%
  filter(schedule_season >= 2014, schedule_season <= 2020, schedule_playoff == FALSE)

# Calculate wins for each game
relevant_scores <- relevant_scores %>%
  mutate(home_win = ifelse(score_home > score_away, 1, 0),
         away_win = ifelse(score_away > score_home, 1, 0))

# Calculate win percentages by team and season
team_win_percentages <- relevant_scores %>%
  select(schedule_season, team_home, team_away, home_win, away_win) %>%
  mutate(team = team_home, win = home_win) %>%
  bind_rows(
    relevant_scores %>%
      select(schedule_season, team_home, team_away, home_win, away_win) %>%
      mutate(team = team_away, win = away_win)
  ) %>%
  group_by(team, schedule_season) %>%
  summarise(total_games = n(), wins = sum(win), .groups = 'drop') %>%
  mutate(win_percentage = wins / total_games)

# Join QB salaries to win percentages (highest_paid_qbs columns: team, season, highest_salary)
analysis_data <- merge(highest_paid_qbs, team_win_percentages, by.x = c("team", "season"), by.y = c("team", "schedule_season")) %>%
  left_join(team_color, by = "team")

# Scatterplot of team winning percentages vs. highest-paid QB salary
ggplot(analysis_data, aes(x = highest_salary, y = win_percentage)) +
  geom_point(aes(color = primary_color), size = 3, show.legend = FALSE) +
  geom_smooth(se = FALSE, color = "red") +
  scale_color_identity() +
  scale_x_continuous(labels = scales::dollar_format(scale = 1e-6, suffix = "M")) +
  labs(title = "Winning Percentage and Highest-Paid QB Salary (2014-2020)",
       x = "Highest Paid QB Salary (Millions USD)",
       y = "Team Winning Percentage") +
  theme_minimal()

```

Column
---

### Analysis

Here, we are looking at how a team's winning percentage relates to the amount it pays its highest-paid Quarterback, used here as a stand-in for the starter. The overall positive trend of the plot suggests that paying a QB more is usually associated with a better winning percentage. This somewhat contradicts what we saw earlier with MVP performance and salary, which showed that the league average salary for a starting QB was higher than the salaries for MVPs. My argument for that was that allocating more money for other positions would lead to more team success, but this seems to state the opposite. What this plot suggests is what many people, including myself before starting this assignment, believe about the Quarterback position: that it is the most important to winning and is therefore the most valuable.
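The strength of a trend like this can be checked with a simple linear fit and a correlation test. The sketch below is only an illustration: `toy_analysis` uses invented team-seasons with the same column names as `analysis_data`, and the result says nothing about the real data.

```r
# Toy team-seasons standing in for analysis_data; all values are invented
toy_analysis <- data.frame(
  highest_salary = c(2e6, 6e6, 12e6, 18e6, 24e6, 30e6),
  win_percentage = c(0.31, 0.44, 0.50, 0.47, 0.62, 0.56)
)

# Change in winning percentage per additional $1M of QB cap hit
fit <- lm(win_percentage ~ I(highest_salary / 1e6), data = toy_analysis)
slope_per_million <- unname(coef(fit)[2])

# Pearson correlation with a p-value; a positive estimate is not proof of causation
trend <- cor.test(toy_analysis$highest_salary, toy_analysis$win_percentage)

c(slope_per_million = slope_per_million,
  r = unname(trend$estimate),
  p_value = trend$p.value)
```

A small slope with a wide spread of points would mean salary explains only part of the variation in winning, which is one way to reconcile this plot with the MVP results.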

Home Field Advantage
===

Column {data-width=750}
---

### Description

Here, we look at how home field affects winning percentage in the NFL. We will look at this both by showing teams winning games and how these percentages would be affected by factoring in betting spreads. This will also be able to tell us which teams have the best home field advantage. Since each team should play the same number of home and away games each season, home field advantage is a very important statistic.


### Home Field Advantage (Reg. Season)


```{r homefield}
# Calculate the win for each game
weekly_scores <- weekly_scores %>%
  mutate(home_win = ifelse(score_home > score_away, 1, 0),
         away_win = ifelse(score_away > score_home, 1, 0))

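# Keep regular-season games only so the chart matches its title
regular_scores <- weekly_scores %>%
  filter(schedule_playoff == FALSE)

# Aggregate win data by home and away games
home_stats <- regular_scores %>%
  group_by(team_home) %>%
  summarise(home_games = n(),
            home_wins = sum(home_win))

away_stats <- regular_scores %>%
  group_by(team_away) %>%
  summarise(away_games = n(),
            away_wins = sum(away_win))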

# Join home and away data
win_percentages <- full_join(home_stats, away_stats, by = c("team_home" = "team_away")) %>%
  mutate(home_win_percentage = home_wins / home_games,
         away_win_percentage = away_wins / away_games,
         win_percentage_difference = home_win_percentage - away_win_percentage)

team_colors <- data.frame(
  team_home = c("New England Patriots", "Kansas City Chiefs", "Green Bay Packers", "New Orleans Saints", 
                "Tampa Bay Buccaneers", "Buffalo Bills", "Baltimore Ravens", "Los Angeles Rams", 
                "Seattle Seahawks", "Pittsburgh Steelers", "Minnesota Vikings", "Indianapolis Colts", 
                "Philadelphia Eagles", "Tennessee Titans", "Dallas Cowboys", "San Francisco 49ers", 
                "Houston Texans", "Arizona Cardinals", "Chicago Bears", "Las Vegas Raiders", 
                "Cleveland Browns", "Atlanta Falcons", "Carolina Panthers", "Jacksonville Jaguars", 
                "New York Giants", "Los Angeles Chargers", "Cincinnati Bengals", "Denver Broncos", 
                "Miami Dolphins", "Washington Commanders", "New York Jets", "Detroit Lions"),
  primary_color = c("#002244", "#E31837", "#203731", "#D3BC8D", "#D50A0A", "#00338D", "#241773", 
                    "#003594", "#69BE28", "#FFB612", "#4F2683", "#003D79", "#004C54", "#2C135E", 
                    "#041E42", "#AA0000", "#03202F", "#97233F", "#0B162A", "#000000", 
                    "#FF3C00", "#A71930", "#0085CA", "#006778", "#FFA300", "#002244", "#FB4F14", 
                    "#FB4F14", "#008E97", "#773141", "#125740", "#0076B6")
)

# Ensure win_percentages_sorted is defined, and join with team colors
win_percentages_sorted <- win_percentages %>%
  arrange(desc(win_percentage_difference)) %>%
  left_join(team_colors, by = "team_home")

# Plot using ggplot
ggplot(win_percentages_sorted, aes(x = reorder(team_home, win_percentage_difference), 
                                   y = win_percentage_difference, 
                                   fill = primary_color)) +  
  geom_bar(stat = "identity", position = "dodge") +
  scale_fill_identity() +  
  labs(x = "NFL Team", 
       y = "Home Win Rate - Away Win Rate",
       title = "Home Field Advantage (Regular Season)") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 65, hjust = 1))
```

Column
---

### Analysis

What this data shows is that, as expected, every team has had a better record at home than on the road during the regular season since the year 2000. The reason why some teams have a bigger home field advantage than others could be due to a variety of factors. Teams such as the **Vikings**, **Seahawks**, and **Ravens** are known to have some of the best atmospheres and most dedicated fans in the league. In the playoffs, some of the strange-looking stats are probably due to a smaller sample size. For example, the **Detroit Lions** have a 100% win percentage at home in the playoffs since 2000, but they have only played one home playoff game. Also, one outlier is that only one team hasn't played a home playoff game since 2000, and that team is the **Cleveland Browns**.
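The sample-size caveat can be made concrete with an exact binomial confidence interval. This sketch compares one win in one game with a hypothetical record of 60 wins in 100 games. The second record is invented for contrast.

```r
# Exact (Clopper-Pearson) 95% interval for 1 home win in 1 home game
one_game <- as.numeric(binom.test(x = 1, n = 1)$conf.int)

# Hypothetical larger sample for contrast: 60 wins in 100 home games
many_games <- as.numeric(binom.test(x = 60, n = 100)$conf.int)

# Interval widths: nearly the full 0-1 range for one game, far narrower for 100
width_one  <- one_game[2] - one_game[1]
width_many <- many_games[2] - many_games[1]

c(width_one = width_one, width_many = width_many)
```

With a single game, the data are consistent with almost any true home win rate, so a 100% playoff record at home carries very little information.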

### Home Field Advantage (Playoffs)

```{r playoff homefield}
# Filter the weekly_scores data for only playoff games
playoff_scores <- weekly_scores %>%
  filter(schedule_playoff == TRUE)  # Ensure this column correctly identifies playoff games

# Calculate the win for each game
playoff_scores <- playoff_scores %>%
  mutate(home_win = ifelse(score_home > score_away, 1, 0),
         away_win = ifelse(score_away > score_home, 1, 0))

# Aggregate win data by home and away games
home_stats <- playoff_scores %>%
  group_by(team_home) %>%
  summarise(home_games = n(),
            home_wins = sum(home_win))

away_stats <- playoff_scores %>%
  group_by(team_away) %>%
  summarise(away_games = n(),
            away_wins = sum(away_win))

# Join home and away data
win_percentages_playoffs <- full_join(home_stats, away_stats, by = c("team_home" = "team_away")) %>%
  mutate(home_win_percentage = ifelse(home_games > 0, home_wins / home_games, NA),
         away_win_percentage = ifelse(away_games > 0, away_wins / away_games, NA),
         win_percentage_difference = home_win_percentage - away_win_percentage) %>%
  arrange(desc(win_percentage_difference)) %>%
  left_join(team_colors, by = "team_home")

# Filter out teams with zero home playoff games or where win percentage difference is NA
win_percentages_playoffs <- win_percentages_playoffs %>%
  filter(!is.na(win_percentage_difference) & home_games > 0)

# Plot using ggplot
ggplot(win_percentages_playoffs, aes(x = reorder(team_home, win_percentage_difference), 
                                   y = win_percentage_difference, 
                                   fill = primary_color)) +
  geom_bar(stat = "identity", position = "dodge") +
  scale_fill_identity() +
  labs(x = "NFL Team", 
       y = "Home Win Rate - Away Win Rate",
       title = "Home Field Advantage (Playoffs)") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 65, hjust = 1))
```




Conclusion
===

Column
---

### Conclusion

After looking over the entire presentation, a few things stand out.

 - **Player success doesn't necessarily directly depend on salary, but it is often a good indicator of a player's ability.** Since teams usually compete with each other to sign players, it makes sense that some players have a high salary but a worse performance in a given year. On the flip side, rookies can only make a capped amount of money for usually 3 to 4 years depending on when they were drafted. This means that by selecting a player later in the draft who has a great output in his first few years, a team can get a lot of value from a player while spending a small amount of money.
 
 - **Team success can be shown to be directly impacted by a QB's salary.** As I previously mentioned, QBs are the most valuable position on a football field, so it makes sense that they have the highest impact on team success and that their salaries reflect that.
 
 - **Teams usually get what they pay for with WRs and TEs.** These two positions have been used more and more in recent years with the West Coast style offense beginning to emerge in the NFL. As time goes on, I predict that these positions' salaries will become highly inflated.
 
 - **Regular Season home field advantage doesn't translate to the Playoffs.** A common adage in professional sports is that the regular season and the postseason are two completely different games. With the bar charts we have looked at, that seems to be the case. I would have expected that teams with a much better home record in the regular season would carry that into the playoffs, but that does not seem to be the case. Whether simply due to a smaller sample size or maybe a dampened effect of home field advantage in the playoffs, it is unclear why teams that perform well at home during the regular season don't always do as well in the postseason.
 
Column
--- 

### Difficulties

During this project, I ran into several challenges. Some of these were likely avoidable and some of them weren't. For one, the many team name changes and moving locations made it a struggle to correlate some of these dataframes. I was unsure why I couldn't use the "name" variable as a key for a couple different data.frames and it took me a long time to realize why. This could have been solved much earlier by printing all the different team names in each set of data and then adjusting them until I was left with the correct number. Another issue is that the amount of data available for some of these stats can be very hard to find. I was able to find every score for every NFL game in history within minutes, but I searched for days to find the salaries of all NFL players only to find a set that ranged from 2014 - 2020. This left a lot of gaps in the data because I had far too much data in some places but not others which meant I couldn't go into everything I originally wanted to.
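The name check described above can be sketched with `setdiff()`. The two vectors below are hypothetical examples of team keys from two sources, one of which uses older franchise names. The mapping in `name_fixes` is an assumed lookup built by hand.

```r
# Hypothetical team keys from two sources; one still uses older franchise names
names_in_scores   <- c("Las Vegas Raiders", "Washington Commanders", "Cleveland Browns")
names_in_salaries <- c("Oakland Raiders", "Washington Redskins", "Cleveland Browns")

# Names that would fail to match in a join, checked in each direction
only_in_scores   <- setdiff(names_in_scores, names_in_salaries)
only_in_salaries <- setdiff(names_in_salaries, names_in_scores)

# Assumed hand-built lookup that maps old names to current ones
name_fixes <- c("Oakland Raiders"     = "Las Vegas Raiders",
                "Washington Redskins" = "Washington Commanders")

# Replace names found in the lookup and keep all others unchanged
standardized <- ifelse(names_in_salaries %in% names(name_fixes),
                       name_fixes[names_in_salaries],
                       names_in_salaries)

# No unmatched names should remain once both sides are aligned
remaining <- setdiff(names_in_scores, standardized)
length(remaining)
```

Running this kind of check before every join makes mismatched keys visible immediately instead of showing up later as missing rows.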


### About the Author

My name is Riley Barrett and at the time of this presentation I will be graduating in less than a week. I chose to do this presentation on this topic because I was interested in seeing some new things about the NFL that I hadn't really thought about before. I have always been a big fan of the Cleveland Browns, which led me to this project initially since they have been a franchise with little success in the past 30 years. 

I have worked at the Wright-Patterson AFB for almost 2 years where I have done data manipulation on all different kinds of data. 

I took this class because I wanted to learn more about different ways to display and manipulate large data sets.

In the next month, I will begin a career in electrical engineering at a firm in Cleveland, OH where I can hopefully use what I learned in this class to benefit myself and my company.